Learning Phrase-Based Spelling Error Models from Clickthrough Data
Abstract
This paper explores the use of clickthrough data for query spelling correction. First, a large number of query-correction pairs are derived by analyzing users' query reformulation behavior as recorded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
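To make the idea of a "transformation probability between multi-term phrases" concrete, here is a minimal Python sketch. It is not the paper's training procedure: it assumes each query and its correction have the same number of tokens and align position by position, and it estimates P(query phrase | correction phrase) by plain relative frequency. The function names and the toy pairs are hypothetical and invented for illustration only.

```python
from collections import Counter, defaultdict


def phrase_pairs(query, correction, max_len=2):
    """Yield aligned (correction_phrase, query_phrase) pairs.

    Simplifying assumption: both strings have the same number of tokens
    and align position by position, so phrases are same-span n-grams.
    """
    q, c = query.split(), correction.split()
    if len(q) != len(c):
        return  # this sketch skips pairs that would need a real alignment
    for n in range(1, max_len + 1):
        for i in range(len(q) - n + 1):
            yield " ".join(c[i:i + n]), " ".join(q[i:i + n])


def train_error_model(pairs, max_len=2):
    """Estimate P(query_phrase | correction_phrase) by relative frequency."""
    counts = defaultdict(Counter)
    for query, correction in pairs:
        for c_phrase, q_phrase in phrase_pairs(query, correction, max_len):
            counts[c_phrase][q_phrase] += 1
    model = {}
    for c_phrase, q_counts in counts.items():
        total = sum(q_counts.values())
        model[c_phrase] = {q: k / total for q, k in q_counts.items()}
    return model


# Hypothetical (query, correction) pairs, standing in for pairs mined
# from query reformulations in clickthrough logs.
toy_pairs = [
    ("britny spears", "britney spears"),
    ("britney spears", "britney spears"),
    ("brittney spears", "britney spears"),
    ("britney speers", "britney spears"),
]
model = train_error_model(toy_pairs)
print(model["britney"])
```

A real system would need a proper phrase alignment for pairs of different lengths and some smoothing of the estimates; both are left out here to keep the sketch short.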
Similar resources
Learning to Rank with Attentive Media Attributes
In the context of media search engines where assets have small textual data available, we explore several models that improve the learning to rank use cases. In particular, we propose a model with an attention mechanism that leverages phrase-based attributes to guide the importance of other keyword-based attributes. We train these models with clickthrough data from Adobe Stock search queries an...
Design and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data
In this paper we present WCL2R, a benchmark collection for supporting research in learning to rank (L2R) algorithms which exploit clickthrough features. Differently from other L2R benchmark collections, such as LETOR and the recently released Yahoo!’s collection for a L2R competition, in WCL2R we focus on defining a significant (and new) set of features over clickthrough data extracted from the...
Robust Error Detection: A Hybrid Approach Combining Unsupervised Error Detection and Linguistic Knowledge
This article presents a robust probabilistic method for the detection of context-sensitive spelling errors. The algorithm identifies less-frequent grammatical constructions and attempts to transform them into more-frequent constructions while retaining similar syntactic structure. If the transformations result in low-frequency constructions, the text is likely to contain an error. A first unsuper...
Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique
This paper proposes a novel multi-layer approach to fundamental frequency modeling for concatenative speech synthesis based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of long-term, intonational phrase-level, component and short-term, accentual phrase-level, component, along with a least-squares error criterion that includes a re...
Publication date: 2010